by Oded Loewenstein

Hi and welcome to my blog!
Today I’m going to analyze geographical differences in happy moments and consider how businesses could use such an analysis to their advantage. The data comes from HappyDB, a corpus of 100,000 crowdsourced happy moments collected from various countries across the globe.

Businesses nowadays strive to optimize their operations. Using the happy moments database, they can learn how the sources of happiness differ geographically and adapt accordingly: by understanding what makes people happy in each part of the world, a company can market specific products, or even create a new product or service, that caters to the needs and desires of each area.

To do so, we first need to define a subject for each happy moment. An intuitive technique for this is “tf-idf”, which measures how important a word is to a specific document, so the highest-scoring word is a reasonable stand-in for the document’s subject. This “importance score” is computed by multiplying two measures: (1) term frequency, the number of times a word occurs in a document, and (2) inverse document frequency, which decreases the weight of words that occur across many documents, because some words are simply more common than others (for example, “the” is a very common word, but it is unlikely to be the subject of a document). Multiplying the two gives the tf-idf score, which captures the importance of a word to its document and, in other words, tells us the “subject” or the “reason” for the happy moment.
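To make the computation concrete, here is a minimal sketch in plain Python. It is not the exact code behind the charts in this post: the tokenizer, the `log(N / df)` form of the inverse document frequency, the function name `tfidf_subjects`, and the three toy moments are all assumptions made for illustration.

```python
import math
import re
from collections import Counter


def tokenize(text):
    # Assumption: lowercase the text and keep runs of letters only.
    return re.findall(r"[a-z]+", text.lower())


def tfidf_subjects(moments):
    """Return the highest tf-idf word of each moment (its "subject")."""
    docs = [tokenize(m) for m in moments]
    n_docs = len(docs)

    # Document frequency: in how many moments does each word appear?
    df = Counter()
    for tokens in docs:
        df.update(set(tokens))

    subjects = []
    for tokens in docs:
        tf = Counter(tokens)  # term frequency within this moment
        # Assumed idf variant: log(N / df). A word that appears in
        # every moment gets idf = log(1) = 0, so it can never win.
        scores = {w: tf[w] * math.log(n_docs / df[w]) for w in tf}
        subjects.append(max(scores, key=scores.get))
    return subjects


# Toy moments, invented for this example (not taken from the corpus).
moments = [
    "I went fishing with my daughter and my daughter caught a fish",
    "I got a raise at work and my boss praised my work",
    "I played guitar all evening and my guitar sounded great",
]
print(tfidf_subjects(moments))
```

Words such as “I”, “my” and “and” appear in all three toy moments, so their idf is zero and they drop out, while the repeated, moment-specific words come out on top.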

After doing the statistical work and finding the subject of every happy moment, let’s look at some of the conclusions that can be inferred from the results:

The 10 most common “subjects” of happy moments in the USA


First, some cleanup can and should be done to merge near-duplicates (for example, in the US example, fiance and fiancee are probably the same subject). Similar subjects can also be grouped together (some could say that boyfriend belongs with the fiancee group), although keeping them apart can provide useful segmentation too. For example, Explora, an online travel retailer, may advertise romantic destinations in places with many soon-to-be-married people, and a trip to Disneyland to someone whose happy moments involve his daughter.
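One simple way to do this kind of merging is a hand-written lookup table applied before counting. The sketch below assumes exactly that; the `VARIANTS` and `GROUPS` tables and the `merge_subjects` function are hypothetical names, and the entries in them are only the examples mentioned above.

```python
from collections import Counter

# Hypothetical tables, written by hand for this example.
# VARIANTS merges spellings that mean the same thing.
VARIANTS = {"fiance": "fiancee"}
# GROUPS is an optional, coarser grouping of related subjects.
GROUPS = {"fiancee": "partner", "boyfriend": "partner"}


def merge_subjects(subjects, use_groups=False):
    """Count subjects after merging variants (and, optionally, groups)."""
    merged = []
    for subject in subjects:
        subject = VARIANTS.get(subject, subject)
        if use_groups:
            subject = GROUPS.get(subject, subject)
        merged.append(subject)
    return Counter(merged)


subjects = ["fiance", "fiancee", "boyfriend", "daughter"]
print(merge_subjects(subjects))
print(merge_subjects(subjects, use_groups=True))
```

Keeping the two tables separate makes it easy to merge spelling variants while still leaving boyfriend and fiancee apart when that distinction is useful for segmentation.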


The 10 most common “subjects” of happy moments in India

Let’s again step into Explora’s shoes: if people in India are primarily happy because of things related to the temple and shopping, Explora could mainly market trips to sacred and religious places with many shopping options. In the US, however, Explora would mainly market trips built around events or, as mentioned earlier, romantic trips. By mapping the most frequent happy moments across different parts of the globe, a company could optimize sales by understanding what makes most people happy in each country.
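Once every moment has a subject, the per-country ranking is just a grouped count. Here is a rough sketch, assuming the data has been reduced to `(country, subject)` pairs; the function name `top_subjects_by_country` and the sample records are made up for illustration.

```python
from collections import Counter, defaultdict


def top_subjects_by_country(records, k=10):
    """Return the k most common subjects for each country."""
    counts = defaultdict(Counter)
    for country, subject in records:
        counts[country][subject] += 1
    return {country: c.most_common(k) for country, c in counts.items()}


# Invented sample records, not real counts from the corpus.
records = [
    ("USA", "event"), ("USA", "event"), ("USA", "fiancee"),
    ("India", "temple"), ("India", "temple"), ("India", "shopping"),
]
print(top_subjects_by_country(records, k=1))
```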


The most common “subject” of happy moments in 5 selected countries

Finally, further segmentation could refine the marketing in each area even more, for example by distinguishing between genders. Let’s take Mamazon, an online merchandise retailer, as an example. The graphs below illustrate the 10 most frequent happy moment subjects among women and men in the US.


Comparison between women’s and men’s most common “subjects” of happy moments

We can easily observe that events are much more associated with happiness for women than for men. We can also observe that, while the spouse appears with similar frequency for both genders, women’s other main sources of happiness are still family (daughter, grandchildren), while men’s are success and hobbies (money, work, guitar, fishing). Therefore, Mamazon could put a higher emphasis on advertising gifts for family members when the user is a woman, and gifts for the user himself when the user is a man.
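When two groups differ in size, raw counts can mislead, so it can help to compare each subject’s share of its group instead. The following is only a sketch of that idea, with hypothetical function names (`subject_shares`, `share_gap`) and invented sample data; it is not how the graphs above were produced.

```python
from collections import Counter


def subject_shares(subjects):
    """Fraction of a group's moments that fall under each subject."""
    counts = Counter(subjects)
    total = sum(counts.values())
    if total == 0:
        return {}
    return {s: c / total for s, c in counts.items()}


def share_gap(group_a, group_b):
    """Share in group A minus share in group B, for every subject."""
    a, b = subject_shares(group_a), subject_shares(group_b)
    return {s: a.get(s, 0.0) - b.get(s, 0.0) for s in set(a) | set(b)}


# Invented sample data, not real counts from the corpus.
women = ["event", "event", "husband", "daughter"]
men = ["wife", "work", "event", "guitar"]

gap = share_gap(women, men)
# Positive values lean toward the first group, negative toward the second.
for subject in sorted(gap, key=gap.get, reverse=True):
    print(subject, round(gap[subject], 2))
```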

To conclude, we saw how text processing of happy moments with the tf-idf function could be an integral tool for businesses to optimize their marketing and the services they offer. It should be noted that with some modifications (combining similar words or “reasons”, as mentioned before, is just the tip of the iceberg), we could reach even better results.

Reference: Akari Asai, Sara Evensen, Behzad Golshan, Alon Halevy, Vivian Li, Andrei Lopatenko, Daniela Stepanov, Yoshihiko Suhara, Wang-Chiew Tan, Yinzhan Xu, “HappyDB: A Corpus of 100,000 Crowdsourced Happy Moments”, LREC ’18, May 2018. (to appear)